Fix synchronization issue when writing string columns with dictionary to ORC #14595

Merged


vuule

@vuule vuule commented Dec 7, 2023

Description

Changes in #14295 introduced a synchronization issue in build_dictionaries. After stripe_dicts are initialized on the host, we copy them to the device and then launch kernels that read the device copy of the dicts. After launching these kernels, we deallocate buffers that are no longer needed and clear the dicts' views to these buffers on the host. The problem is that, without synchronization after the H2D copy, the host-side modification can happen before the H2D copy is actually performed, so the kernels run with the altered state.
This PR adds a sync point to make sure the copy is done before host-side modification.
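
To illustrate the ordering hazard described above, here is a minimal host-only sketch. It is not the libcudf code: `toy_stream`, `dict_view`, and `count_valid_views_seen_by_kernel` are hypothetical names, and the toy stream defers all queued work until `synchronize()` is called, which models the latest point at which a real stream could execute the copy. A real CUDA stream may run the copy at any time between the enqueue and the synchronize, which is why the actual bug was intermittent.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dictionary descriptor that holds a view into a buffer.
struct dict_view {
  int const* data;
  std::size_t size;
};

// Toy model of a stream: queued work runs in order, but only when synchronize() is called.
class toy_stream {
 public:
  void enqueue(std::function<void()> task) { tasks_.push(std::move(task)); }

  void synchronize()
  {
    while (!tasks_.empty()) {
      tasks_.front()();
      tasks_.pop();
    }
  }

 private:
  std::queue<std::function<void()>> tasks_;
};

// Returns how many views are still valid when the "kernel" reads the "device" copy.
std::size_t count_valid_views_seen_by_kernel(bool sync_after_copy)
{
  std::vector<int> buffer(16, 1);
  std::vector<dict_view> host_dicts(4, dict_view{buffer.data(), buffer.size()});
  std::vector<dict_view> device_dicts(host_dicts.size(), dict_view{nullptr, 0});
  assert(device_dicts.size() == host_dicts.size());

  std::size_t valid = 0;
  toy_stream stream;

  // "H2D copy": reads host_dicts whenever the stream actually executes it.
  stream.enqueue([&] { device_dicts = host_dicts; });

  // The sync point: make sure the copy has completed before the host state changes.
  if (sync_after_copy) { stream.synchronize(); }

  // "Kernel": reads only the device copy of the dicts.
  stream.enqueue([&] {
    for (auto const& d : device_dicts) {
      if (d.data != nullptr && d.size > 0) { ++valid; }
    }
  });

  // Host-side cleanup: clear the views held by the host copy.
  for (auto& d : host_dicts) { d = dict_view{nullptr, 0}; }

  stream.synchronize();
  return valid;
}
```

In this model, `count_valid_views_seen_by_kernel(true)` sees all four views intact, while `count_valid_views_seen_by_kernel(false)` sees none, because the deferred copy picks up the already-cleared host state.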

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vuule vuule added bug Something isn't working non-breaking Non-breaking change labels Dec 7, 2023
@vuule vuule self-assigned this Dec 7, 2023
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Dec 7, 2023
@vuule vuule changed the base branch from branch-24.02 to branch-23.12 December 7, 2023 20:10
@GregoryKimball

GregoryKimball commented Dec 7, 2023

@abellina Would you please confirm that disabling the ORC dict sorting addresses the query error? I believe this confirmation should be a prerequisite to merging.

@jlowe

jlowe commented Dec 7, 2023

@abellina and I verified that disabling the dictionary sorting did not resolve the issue. We're currently digging into other parts of the associated PR that were not specific to the sorted dictionary path.

@vuule vuule changed the title from "Turn off dictionary sorting by default in ORC writer" to "Fix synchronization issue when writing string columns with dictionary to ORC" Dec 7, 2023
@vuule vuule marked this pull request as ready for review December 7, 2023 21:49
@vuule vuule requested a review from a team as a code owner December 7, 2023 21:49
@vuule vuule requested review from harrism and hyperbolic2346 and removed request for a team December 7, 2023 21:49

@bdice bdice left a comment

This looks good to me.

Review comment on cpp/src/io/orc/writer_impl.cu (outdated, resolved)
@raydouglass raydouglass merged commit fd2f6a6 into rapidsai:branch-23.12 Dec 8, 2023
64 of 65 checks passed
karthikeyann pushed a commit to karthikeyann/cudf that referenced this pull request Dec 12, 2023
… to ORC (rapidsai#14595)

Changes in rapidsai#14295 introduced a synchronization issue in `build_dictionaries`. After stripe_dicts are initialized on the host, we copy them to the device and then launch kernels that read the device copy of the dicts. After launching these kernels, we deallocate buffers that are no longer needed and clear the dicts' views to these buffers on the host. The problem is that, without synchronization after the H2D copy, the host-side modification can happen before the H2D copy is actually performed, so the kernels run with the altered state.
This PR adds a sync point to make sure the copy is done before host-side modification.

Authors:
   - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
   - Nghia Truong (https://github.com/ttnghia)
   - Alessandro Bellina (https://github.com/abellina)
   - Bradley Dice (https://github.com/bdice)
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
7 participants